Processing Dialectal Arabic: Exploiting Variability and Similarity to Overcome Challenges and Discover Opportunities
نویسنده
چکیده
We recently witnessed an exponential growth in dialectal Arabic usage in both textual data and speech recordings especially in social media. Processing such media is of great utility for all kinds of applications ranging from information extraction to social media analytics for political and commercial purposes to building decision support systems. Compared to other languages, Arabic, especially the informal variety, poses a significant challenge to natural language processing algorithms since it comprises multiple dialects, linguistic code switching, and a lack of standardized orthographies, to top its relatively complex morphology. Inherently, the problem of processing Arabic in the context of social media is the problem of how to handle resource poor languages. In this talk I will go over some of our insights to some of these problems and show how there is a silver lining where we can generalize some of our solutions to other low resource language contexts.
منابع مشابه
Elaborating on the opinion of medical and nursing students of the Kurdistan University of Medical Sciences: Challenges and opportunities of virtual learning in focus
Introduction: Universities have also used online courses as a tool to establish lifelong learning among students. Lifelong learning has become part of the way of life due to the dynamic nature of modern society. The community's demand for lifelong learning will be supported by the growth of online learning courses. Universities can reduce the cost of education providers by developing distance l...
متن کاملDALILA: The Dialectal Arabic Linguistic Learning Assistant
Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP). The number and sophistication of tools and datasets in DA are very limited in comparison to Modern Standard Arabic (MSA) and other languages. MSA tools do not effectively model DA which makes the direct use of MSA NLP tools for handling dialects impractical. This is particularly a challenge for the creation of...
متن کاملSEeSAW - Similarity Exploiting Storage for Accelerating Analytics Workflows
The key to successful deployment of big data solutions lies in the timely distillation of meaningful information. This is made difficult by the mismatch between volume and velocity of data at scale and challenges posed by disparate speeds of IO, CPU, memory and communication links of data storage and processing systems. Instead of viewing storage as a bottleneck in this pipeline, we believe tha...
متن کاملCODACT: Towards Identifying Orthographic Variants in Dialectal Arabic
Dialectal Arabic (DA) is the spoken vernacular for over 300M people worldwide. DA is emerging as the form of Arabic written in online communication: chats, emails, blogs, etc. However, most existing NLP tools for Arabic are designed for processing Modern Standard Arabic, a variety that is more formal and scripted. Apart from the genre variation that is a hindrance for any language processing, e...
متن کاملArabic Dialect Processing Tutorial
The existence of dialects for any language constitutes a challenge for NLP in general since it adds another set of variation dimensions from a known standard. The problem is particularly interesting and challenging in Arabic and its different dialects, where the diversion from the standard could, in some linguistic views, warrant a classification as different languages. This problem would not b...
متن کامل